Local Monotonic Attention Mechanism for End-to-End Speech Recognition

Authors

Andros Tjandra

Sakriani Sakti

Satoshi Nakamura

Abstract

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attention mechanism, which allows the model to learn alignments between the source and target sequences. Most attention mechanisms used today are based on a global attention property, which requires computing a weighted summarization of the whole input sequence generated by the encoder states. However, this is computationally expensive and often produces misalignments on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of several tasks, such as automatic speech recognition (ASR) and grapheme-to-phoneme conversion (G2P). In this paper, we propose a novel attention mechanism that has local and monotonic properties. Various ways to control those properties are also explored. Experimental results on ASR, G2P, and machine translation between two languages with similar sentence structures demonstrate that the proposed encoder-decoder model with local monotonic attention can achieve significant performance improvements and reduce computational complexity compared with a model that uses the standard global attention architecture.
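To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares a global attention step, which scores every encoder state, with a windowed step whose center can only move forward. The function names, the dot-product scoring, and the `step` and `half_width` parameters are assumptions made for illustration; the abstract does not specify how the position or window is chosen.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, states):
    """Context vector: sum of states scaled by their attention weights."""
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def global_attention(query, enc_states):
    """Score all T encoder states, so each decoder step costs O(T)."""
    weights = softmax([dot(query, h) for h in enc_states])
    return weighted_sum(weights, enc_states), weights


def local_monotonic_attention(query, enc_states, prev_center, step, half_width):
    """Score only a window around a center that never moves backward.

    `step` and `half_width` are hypothetical inputs; a real model would
    have to predict or fix them in some way.
    """
    last = len(enc_states) - 1
    # Monotonic: negative steps are clamped to zero.
    center = min(prev_center + max(step, 0), last)
    lo = max(0, center - half_width)
    hi = min(len(enc_states), center + half_width + 1)
    window = enc_states[lo:hi]
    local_weights = softmax([dot(query, h) for h in window])
    # Weights outside the window are exactly zero.
    weights = [0.0] * len(enc_states)
    weights[lo:hi] = local_weights
    return weighted_sum(local_weights, window), weights, center


enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
query = [1.0, 0.0]
g_ctx, g_w = global_attention(query, enc)
l_ctx, l_w, center = local_monotonic_attention(query, enc, 1, 1, 1)
```

In this sketch the local step scores at most `2 * half_width + 1` states regardless of input length, which is the kind of saving the abstract attributes to locality, while the clamped `step` keeps the alignment moving left to right.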


Similar resources

Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing


Online and Linear-Time Attention by Enforcing Monotonic Alignments

Recurrent neural network models with an attention mechanism have proven to be extremely effective on a wide variety of sequence-to-sequence problems. However, the fact that soft attention mechanisms perform a pass over the entire input sequence when producing each element in the output sequence precludes their use in online settings and results in a quadratic time complexity. Based on the insigh...


End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition

End-to-end speech recognition is a recently proposed approach that directly transcribes input speech to text using a single model. End-to-end speech recognition methods, including Connectionist Temporal Classification and attention-based encoder-decoder networks, have been shown to obtain state-of-the-art performance on a number of tasks and significantly simplify the modeling, training and decodi...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


Gaussian Prediction Based Attention for Online End-to-End Speech Recognition

Recently, end-to-end speech recognition has attracted much attention. One of the popular models for end-to-end speech recognition is the attention-based encoder-decoder model, which usually generates output sequences iteratively by attending to the whole representation of the input sequence. However, predicting outputs until receiving the whole input sequence is not practical for online or low...


Journal:

CoRR

Volume: abs/1705.08091 (no issue listed)

Pages: -

Publication date: 2017
